Read PX statbanks

The easiest way to read data to R, is to use a relevant statbank and find the data you need. Do calculations and pivot your data to a presentable table. When ready, save your query and store the query-id somewhere safe. The table can be imported to your r-script with:

sq_data <- read_csv(“https://bank.stat.gl:443/sq/< query-id >.csv”,locale = locale(encoding = “latin1”))

Out-of-the-box Pxweb offers information on selected variables/values in a saved query by adding ‘?select’. But no information on added editing. Also if the metadata in the table, the saved query is based upon, has changed, Pxweb often encounters an error, with no help offered.

Query-id example:
https://bank.stat.gl:443/sq/8fb0941c-3579-4848-a488-6a9afe4266ff

With error:
https://bank.stat.gl:443/sq/8fb0941c-3579-4848-a488-6a9afe42lars

StatGreenland has added a simple ‘sqget’-asp function to bank.stat.gl, that allow one to get information on any existing saved query:

https://bank.stat.gl/sqget.asp?8fb0941c-3579-4848-a488-6a9afe42lars

##########################################
# Full code for sqget.asp, stored in pxweb-root folder
#
##########################################
# <%@ Language=VBScript %>
# ' utf8.asp Written by Hans Kalle for Fisz
# <!--#include file="utf8.asp"-->
# <%
# on error resume next
# 
# callstring= split(request.servervariables("QUERY_STRING"),"/")
# teststring= ucase(callstring(UBound(callstring)))
# 
# Dim objStream, uttext
# Set objStream = CreateObject("ADODB.Stream")
#     objStream.CharSet = "utf-8"
#     objStream.Open
#     objStream.LoadFromFile(Server.Mappath("/App_Data/Queries/" & testString & ".pxsq"))
#     response.write objStream.ReadText()
# objStream.Close
# %>

for more control

For more control and deeper integration Pxweb-based statbanks offers a standard api to be consumed by many machine-languages. We now turn to R:

By october 2021 there are two free R resources to read pxweb-based statbanks via api. (‘pxR’ reads local pcaxis-files only. Denmark and Ireland does not use pxweb-out-of-the-box)

https://cran.r-project.org (package repository):
pxweb
PxWebApiData (SSB.no)
csodata (only CSO.ie)

Github:
statgl (stat.gl)
dkstat (only DST.dk)

#====================================================================================
# General purpose & install
# 
#====================================================================================



library(tidyverse)

## Install or update packages on cran:
# install.packages("pxweb")
# install.packages("PxWebApiData")
# install.packages("csodata")


## Install or update packages on GitHub:
# if(!require("devtools")) install.packages("devtools")
# library("devtools")
#
# install_github("rOpenGov/dkstat") 
# devtools::install_github("StatisticsGreenland/statgl")



Example 1: pxweb (cran)

Magnusson M, Kainu M, Huovari J, Lahti L (2019). “pxweb: R tools for PX-WEB API.”

General interface to all pxweb based Statbanks.
Last updated 2021-10-09

Highlight: Use pxweb_interactive to find relevant table(s) from one of 28 Statbanks and have ready to run r-script generated

In example 1 the pxweb package is used to get data from bank.stat.gl

#====================================================================================
# Example 1: pxweb (cran)
# Magnusson M, Kainu M, Huovari J, Lahti L (2019). “pxweb: R tools for PX-WEB API.”
#====================================================================================

library(pxweb)

#pxweb_interactive()

data_df_pxweb <- pxweb_get_data(url = "https://bank.stat.gl:443/api/v1/en/Greenland/BE/BE80/BEXCALC.PX",
                          query =   list("year of birth" = "*",
                                         gender = c("M", "K"),
                                         "triangles(Lexis)" = "*",
                                         event = "*",
                                         time = "*"))



Example 2: PxWebApiData (cran)

Statistics Norway, Øyvind Langsrud <oyl at ssb.no>

General interface to all pxweb based Statbanks.
Last updated 2021-10-11

In example 2 the PxWebApiData package is used to get data from statbank.hagstova.fo

#====================================================================================
# Example 2: PxWebApiData (cran)
# Statistics Norway, Øyvind Langsrud <oyl at ssb.no> 
#====================================================================================

library(PxWebApiData)

meta <- ApiData("https://statbank.hagstova.fo:443/api/v1/en/H2/DEV/COH/Lexis.px", 
                     returnMetaFrames = TRUE)

names(meta)
## [1] "year of birth"    "event"            "sex"              "Triangles(Lexis)"
## [5] "year"
meta[[2]]$values
## [1] "P" "B" "I" "O" "D" "C" "U"
data <- PxWebApiData::ApiData("https://statbank.hagstova.fo:443/api/v1/en/H2/DEV/COH/Lexis.px",
                              "year of birth" = TRUE,
                              sex = c("M", "F"),
                              "Triangles(Lexis)" = c("0", "1"),
                              event = TRUE,
                              year = TRUE # top3 : 3i instead of TRUE
) 

data_df_PxWebApiData <- data[[1]] # Extract the first list element, which contains full variable names.

head(data_df_PxWebApiData,5)
##   year of birth                      event   sex Triangles(Lexis) year value
## 1          1885 Population (start of year) Males            Upper 1985     1
## 2          1885 Population (start of year) Males            Upper 1986     1
## 3          1885 Population (start of year) Males            Upper 1987     0
## 4          1885 Population (start of year) Males            Upper 1988     0
## 5          1885 Population (start of year) Males            Upper 1989     0



Example 3: statgl (GitHub)

Statistics Greenland - https://github.com/StatisticsGreenland/statgl

General interface to all pxweb based Statbanks.
Last updated 2021-01-04

the statgl-package bundles pxweb-based statbank functionality with presentation features, used by Statistics Greenland.

In example 3 the statgl package is used to get data from statbank.hagstova.fo and also 2 Greenlandic example to show additional features

Sustainable Development Goals

#====================================================================================
# Example 3: statgl (GitHub)
# Statistics Greenland - https://github.com/StatisticsGreenland/statgl
#====================================================================================

library(statgl)

#statgl_search("Population")

data_df_statgl <- statgl_fetch("https://statbank.hagstova.fo:443/api/v1/en/H2/DEV/COH/Lexis.px", 
                           "year of birth" = px_all(),
                           sex = c("M", "F"),
                           "Triangles(Lexis)" = c("0", "1"),
                           event = px_all(),
                           year = px_all(), # px_top(3)
                           .val_code=TRUE) 


CONST_statbank <- "https://bank.stat.gl/api/v1/en/Greenland"

  statgl_url("BEXCALCR", api_url = CONST_statbank) %>%
  statgl_fetch(area = c("NUK"),
    event            = c("P"),
    gender           = px_all(),
    time             = px_top(7),
    .eliminate_rest    = TRUE ,
    .col_code          = TRUE,
    .val_code          = FALSE
  ) %>% 
   select(-event) %>% 
   pivot_wider(names_from = time,values_from = value) %>% 
   statgl_table()
area gender 2015 2016 2017 2018 2019 2020 2021
  1. Nuuk City
Total 16.992 17.316 17.600 17.796 17.984 18.326 18.800
  1. Nuuk City
Female 8.070 8.183 8.334 8.437 8.533 8.703 8.903
  1. Nuuk City
Male 8.922 9.133 9.266 9.359 9.451 9.623 9.897
# https://stat.gl/dialog/mainTheme.asp?lang=en&sc=VM&tname=t1

  library(lubridate)
  
# Import
INXIU101_raw <-
  statgl_url("INXIU101", lang = "en") %>%
  statgl_fetch(
    indicator = 2:4,
    time      = px_all(),
    .col_code = TRUE
    ) %>% 
  as_tibble()

# Transform
INXIU101 <-
  INXIU101_raw %>% 
  mutate(
    time = time %>% make_date(),
    indicator = indicator %>% as.factor() %>% fct_rev()
    )

# Plot
INXIU101 %>% 
  ggplot(aes(
    x    = time,
    y    = value,
    fill = indicator
    )) +
  geom_area(position = "identity") +
  scale_y_continuous(labels  = scales::percent_format(
    scale        = 1,
    accuracy     = 1.1,
    big.mark     = ".",
    decimal.mark = ","
    )) +
  theme_statgl() +
  scale_fill_statgl(reverse = TRUE, guide = guide_legend(reverse = TRUE)) +
  labs(
    title   = "At-risk-of-povery rate",
    x       = " ",
    y       = " "
    )



Example 4: dkstat

https://github.com/rOpenGov/dkstat

Statbank Denmark specific

#====================================================================================
# Example 4: dkstat
# https://github.com/rOpenGov/dkstat
#====================================================================================

library(dkstat)

dkstat::dst_search("Population", lang="en") %>% head(5)
##       id                                       text   unit             updated
## 1 FOLK1A Population at the first day of the quarter Number 2021-08-11T08:00:00
## 2  FOLK3                      Population 1. January Number 2021-02-11T08:00:00
## 3   BEF5                      Population 1. January Number 2021-02-11T08:00:00
## 4     FT       Population figures from the censuses Number 2021-02-11T08:00:00
## 5    BY1                      Population 1. January Number 2021-04-29T08:00:00
##   firstPeriod latestPeriod active
## 1      2008Q1       2021Q3   TRUE
## 2        2008         2021   TRUE
## 3        1990         2021   TRUE
## 4        1769         2021   TRUE
## 5        2010         2021   TRUE
##                                        variables
## 1         region, sex, age, marital status, time
## 2 day of birth, birth month, year of birth, time
## 3               sex, age, country of birth, time
## 4                            national part, time
## 5          urban and rural areas, age, sex, time
folk1a_meta <- dst_meta("folk1a", lang = "da")

folk1a_meta[[1]]
## $id
## [1] "FOLK1A"
## 
## $text
## [1] "Folketal den 1. i kvartalet"
## 
## $description
## [1] "Folketal den 1. i kvartalet efter område, køn, alder, civilstand og tid"
## 
## $unit
## [1] "Antal"
## 
## $updated
## [1] "2021-08-11T08:00:00"
## 
## $footnote
## NULL
folk1a_meta[[2]]
##           id       text elimination
## 1     OMRÅDE     område        TRUE
## 2        KØN        køn        TRUE
## 3      ALDER      alder        TRUE
## 4 CIVILSTAND civilstand        TRUE
## 5        Tid        tid       FALSE
folk1a_meta[[3]]$KØN
##    id    text
## 1 TOT   I alt
## 2   1    Mænd
## 3   2 Kvinder
data_df_dkstat <- str(dst_get_data(table = "folk1a", 
                 OMRÅDE = c("Hele landet", "København", "Frederiksberg", "Odense"), 
                 CIVILSTAND  = "*", 
                 TID = "*", 
                 lang = "da", 
                 meta_data = folk1a_meta))
## 'data.frame':    1100 obs. of  4 variables:
##  $ OMRÅDE    : chr  "Hele landet" "Hele landet" "Hele landet" "Hele landet" ...
##  $ CIVILSTAND: chr  "I alt" "I alt" "I alt" "I alt" ...
##  $ TID       : Date, format: "2008-01-01" "2008-04-01" ...
##  $ value     : int  5475791 5482266 5489022 5505995 5511451 5515287 5519441 5532531 5534738 5540241 ...



Example 5: csodata (cran)

Conor Crowley <conor.crowley at cso.ie>

Statbank Ireland specific

#====================================================================================
# Example 5: csodata (cran)
# Conor Crowley <conor.crowley at cso.ie>
#====================================================================================

library(csodata)

toc <- cso_get_toc()
head(toc)
##          LastModified
## 1 2021-10-20 00:20:00
## 2 2021-10-20 00:20:00
## 3 2021-10-20 00:20:00
## 4 2021-10-20 00:20:00
## 5 2021-10-20 00:20:00
## 6 2021-10-20 00:20:00
##                                                                   title    id
## 1 Roscommon Population by Private Households, Occupied and Vacancy Rate CD168
## 2     Sligo Population by Private Households, Occupied and Vacancy Rate CD169
## 3 Tipperary Population by Private Households, Occupied and Vacancy Rate CD170
## 4 Waterford Population by Private Households, Occupied and Vacancy Rate CD171
## 5 Westmeath Population by Private Households, Occupied and Vacancy Rate CD172
## 6   Wexford Population by Private Households, Occupied and Vacancy Rate CD173
population <- cso_search_toc("Population")

tbl1 <- cso_get_data("PEB07")

meta1 <- cso_get_meta("PEA19") %>% as_tibble()
cso_disp_meta("PEA19")
## [1] "Year"                   "Sex"                    "Education Level"       
## [4] "Inward or Outward Flow"
## [1] "Estimated Migration Aged 15 Years and Over  (Persons in April)"
# data_df_cso <- statgl_fetch(url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.PxAPIv1/en/17/PME/PEA21", 
#                            Year = px_all(),
#                            sex = px_all(),
#                            Nationality = px_all(),
#                            .val_code=TRUE) 
# 
data_df_cso <- pxweb_get_data(url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.PxAPIv1/en/17/PME/PEA21",
                          query =   list(Year = "*",
                                         sex = "*",
                                         Nationality = "*"))
## Error: 
## This is not a PXWEB API: 
## https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.PxAPIv1/en/17/PME/PEA21



Example 6: pxR (cran)

Carlos J. Gil Bellosta <cgb at datanalytics.com>

Read PX-files to R
Last updated 2020-06-07

#====================================================================================
# Example 6: pxR (cran)
# Carlos J. Gil Bellosta <cgb at datanalytics.com>
#====================================================================================

library(pxR)

# Read px-files

# library(pxR)
# 
# Reading PC-Axis files into R
# Function read.px reads a PC-Axis file from a given location and returns an object of class px containing all the data and metadata in the original PC-Axis file.
# 
# The single most important piece of information within a pxobject is the data matrix, which can be extracted into a R data.frame using function as.data.frame. For instance,
# 
# my.px.object <- read.px("/path/to/pc-axis/file")
# my.px.data   <-  as.data.frame(my.px.object)
# will create the data.frame my.px.data with the data in the corresponding PC-Axis file.